Abstract

In this note, we consider the problem of existence of adaptive confidence bands in the 
fixed design regression model, adapting ideas in Hoffmann and Nickl [10] to the present 
case. In the course of the proof, we show that sup-norm adaptive estimators exist as well in 
regression. 

1 Introduction 

We observe random variables $Y_i$ and assume that, for $n \in \mathbb{N}$,

$$Y_i = f(x_i) + \varepsilon_i, \qquad 1 \le i \le n, \qquad (1)$$

where $x_i = i/n$, the $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ are independent and identically distributed random variables defined on $(\Omega, \mathcal{F}, P)$, and $f$ is the unknown regression function. We further assume that the variance $\sigma^2$ is known. Our aim is to reconstruct $f$ from the sample.

Let us first define the parameter space. We assume $f$ belongs to some Hölder space. The Hölder space $C^t$ for $0 < t \le 1$ is the space of continuous functions $f$ on $[0, 1]$ such that

$$\|f\|_{C^t} := \|f\|_\infty + \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|^t} < \infty \quad \text{if } 0 < t < 1,$$

$$\sup_{x,\, y \ne 0} \frac{|f(x + y) + f(x - y) - 2f(x)|}{|y|} < \infty \quad \text{if } t = 1.$$

If $t > 1$, $C^t$ is the space of functions such that the $\lfloor t \rfloor$-th derivative of $f$ exists, is continuous and belongs to $C^{t - \lfloor t \rfloor}$ if $t$ is not an integer, and such that the $t$-th derivative of $f$ exists, is continuous and is in $C^1$ otherwise. Define then $\|f\|_{C^t} = \|f^{(\lfloor t \rfloor)}\|_{C^{t - \lfloor t \rfloor}}$. Therefore, for all $t$, $\|f\|_{C^t}$ is a norm of $f$.



Working with this definition may prove difficult, and we thus use the wavelet basis characterisation of $C^t$: whether or not $f$ belongs to it depends on the size of the coefficients of its decomposition over this basis. We use Daubechies wavelets on the unit interval, as in [10]. Denote, accordingly, by $\phi$ and $\psi$ the scaling and wavelet functions, $\phi_m = \phi(\cdot - m)$, $\psi_{jm} = 2^{j/2} \psi(2^j \cdot - m)$ and, for $\langle \cdot, \cdot \rangle$ the usual inner product in $L^2([0, 1])$,

$$\|f\|_{t,\infty} = \max\left( \sup_m |\langle \phi_m, f \rangle|, \ \sup_{j,m} 2^{j(t+1/2)} |\langle \psi_{jm}, f \rangle| \right).$$

Now, Theorem 4.4 in [5] gives the equivalence: for $t > 0$, $C^t$ is the set of continuous functions on $[0, 1]$ such that $\|f\|_{t,\infty} < \infty$. Moreover, $\|f\|_{t,\infty}$ is a norm equivalent to $\|f\|_{C^t}$.



For fixed $B > 0$, define

$$\Sigma(t) = \{ f : \|f\|_{t,\infty} \le B \}.$$



One typically has no knowledge of the regularity of $f$. But estimators which achieve the optimal risk in sup-norm loss for whatever $t$ have been constructed. They are called adaptive, and exist in the density and the white noise cases, as shown respectively by Giné and Nickl [8] and Goldenshluger and Lepski [9]. Now, Brown and Low [2] have proven that the white noise and regression cases are asymptotically equivalent, and therefore one may legitimately expect adaptive estimators to exist as well in the latter. For the sake of reference we prove it in section 3.2.2, that is, we prove the following theorem, where

$$r_n(t) = \left(\frac{\log n}{n}\right)^{t/(2t+1)}, \qquad t > 0.$$
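As a purely numerical illustration of the rate $r_n(t) = (\log n / n)^{t/(2t+1)}$, the following sketch evaluates it for a few sample sizes and smoothness levels. The function name `sup_norm_rate` and the chosen values of `n` and `t` are illustrative assumptions, not part of the paper.

```python
import math


def sup_norm_rate(n: int, t: float) -> float:
    """Return r_n(t) = (log n / n) ** (t / (2t + 1)) for n >= 2 and t > 0."""
    if n < 2 or t <= 0:
        raise ValueError("need n >= 2 and t > 0")
    return (math.log(n) / n) ** (t / (2.0 * t + 1.0))


if __name__ == "__main__":
    # Hypothetical sample sizes and smoothness levels, for illustration only.
    for n in (10**3, 10**5):
        for t in (0.5, 1.0, 2.0):
            print(f"n={n:>6}  t={t:<4} r_n(t)={sup_norm_rate(n, t):.4f}")
```

Since $\log n / n < 1$, a larger smoothness $t$ gives a larger exponent and hence a faster (smaller) rate, which is why a band calibrated for $\Sigma(s)$ is narrower than one calibrated for $\Sigma(r)$ when $r < s$.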

Theorem 1. Let $Y_1, \dots, Y_n$ verify $Y_i = f(i/n) + \varepsilon_i$ where the $\varepsilon_i$'s are i.i.d. $\mathcal{N}(0, \sigma^2)$. Then, for every integer $l > 0$, there exists an estimator $\hat f_n(x) := \hat f_n(x, Y_1, \dots, Y_n, l)$ and an integer $n_0$ such that, for every $t$, $0 < t \le l$, some constant $D(B, l)$ and every $n \ge n_0$, we have

$$\sup_{f \in \Sigma(t)} E_f \|\hat f_n - f\|_\infty \le D(B, l)\, r_n(t).$$

We use the following notation. For any random variable $X : \Omega \to \mathbb{R}^n$ measurable with respect to $\sigma(Y_1, \dots, Y_n)$ and every function $f \in \Sigma(r)$, $E_f(X)$ is the expectation of $X$ when the function in Equation 1 is $f$. For all $f \in \Sigma(r)$ and all $A \in \mathcal{B}(\mathbb{R}^n)$, $P_f((Y_1, \dots, Y_n) \in A) = E_f[\mathbf{1}_A(Y_1, \dots, Y_n)]$.

Thanks to these estimators, one may then want to construct confidence bands for /, that is 
data-driven sets which cover / at all points simultaneously. 

Definition 1. A confidence band is a family of random intervals

$$C_n = C(Y_1, \dots, Y_n) = \{ [c_n(y), c'_n(y)] \}_{y \in [0,1]}.$$

Define the diameter of $C_n$ by $|C_n| = \sup_{y \in [0,1]} |c'_n(y) - c_n(y)|$.

For instance, Claeskens and Van Keilegom [4] construct such bands for the function and its derivatives. However, just as with estimators, one would like the diameter of the band to be optimal for all $t$, that is:

$$\forall t > 0, \qquad \sup_{f \in \Sigma(t)} E_f |C_n| \le L\, r_n(t)$$

for some constant $L$ independent of $t$. Here, we restrict ourselves to the case where $f$ belongs to $\Sigma(s)$ or $\Sigma(r)$ for $0 < r < s$ given. Now, results by Low [13], by Genovese and Wasserman [6], and also our findings below, imply that constructing such sets is not possible for $\Sigma(s) \cup \Sigma(r)$. To circumvent this problem, two paths have recently been considered. Both restrict the parameter space, so that such negative results no longer apply, while keeping the parameter space as large as possible. The one studied by Giné and Nickl [7] in the density model consists in permanently removing some functions from the model, according to a self-similarity condition. Bull [3] then extends their results, in particular to the white noise model. Hoffmann and Nickl [10], on the other hand, let the parameter space evolve with $n$. In this article, we follow their approach.
Denote by $d(f, \Sigma) = \inf_{g \in \Sigma} \|f - g\|_\infty$ the distance which derives from the sup-norm. For $\rho_n > 0$, define

$$\Sigma(r, \rho_n) = \{ f \in \Sigma(r) : d(f, \Sigma(s)) \ge \rho_n \}.$$

Consider the model

$$\mathcal{P}_n = \Sigma(s) \cup \Sigma(r, \rho_n)$$

for $0 < r < s$.

Definition 2 (Honesty). The confidence band $C_n$ is called asymptotically honest with level $0 < \alpha < 1$ for $\mathcal{P}_n$ if it satisfies the asymptotic coverage inequality

$$\liminf_n \inf_{f \in \mathcal{P}_n} P_f(f \in C_n) \ge 1 - \alpha.$$

Definition 3 (Adaptivity). The confidence band $C_n$ is called adaptive over $\mathcal{P}_n$ if there exists a constant $L$ such that for every $n \in \mathbb{N}$,

$$\sup_{f \in \Sigma(s)} E_f |C_n| \le L\, r_n(s), \qquad \sup_{f \in \Sigma(r, \rho_n)} E_f |C_n| \le L\, r_n(r).$$

One is in turn interested in finding the smallest possible sequence $\rho_n$. Hoffmann and Nickl provide a lower bound for $\rho_n$ which is sharp, as they are able to construct confidence bands for $\rho_n$ of the order of this lower bound. The present article adapts their proofs to the regression model.

To compute the lower bound, Hoffmann and Nickl reduce to a testing problem whose minimax rate of testing is $\rho_n$, and we proceed likewise. To construct confidence bands, we use adaptive estimators, whose existence we prove, as well as a concentration inequality for a certain Gaussian process.

We first give the main result, and carry on with its proof. It relies on some additional results, 
which are discussed in a subsequent section. 

2 Existence of adaptive and honest confidence bands 
2.1 The main result 

We may now give a precise statement of the existence of confidence bands.

Theorem 2. Let $s > r > 0$ and $B > 0$ be given.

• Assume $r > 1/2$ and let $\alpha < 1/2$. Suppose $C_n$ is a confidence band that is honest with level $\alpha$ over $\mathcal{P}_n$, and adaptive. Then necessarily,

$$\liminf_n \frac{\rho_n}{r_n(r)} > 0. \qquad (2)$$

• Let $0 < \alpha < 1$. There exists a sequence $\rho_n$ satisfying

$$\limsup_n \frac{\rho_n}{r_n(r)} < \infty \qquad (3)$$

and a confidence band $C_n$ that is honest with level $\alpha$ over $\mathcal{P}_n$, and adaptive.

The first part of the theorem means that the crown removed around $\Sigma(s)$ has to be large enough. Indeed, there are functions in $\Sigma(r)$ that we cannot distinguish (as Hoffmann and Nickl [10] explain) from those of $\Sigma(s)$, and they must thus be excluded from the model, as detailed in the next section. As explained in [10], the sets constructed cannot be easily computed. In the next two sections we prove this theorem, using auxiliary results we discuss later.
2.2 Proof of the lower bound

In all that follows, $(j_n^*)_{n \ge 0}$ is a positive sequence such that

$$2^{-j_n^* r} \simeq \left(\frac{\log n}{n}\right)^{r/(2r+1)},$$

where $\simeq$ means that each of the two sequences involved is dominated by the other. Before starting the proof, we define a certain set we use in it, then make precise the notation we use for tests.

For $j$ big enough and any $m$, $\psi_{jm}$ is supported in the interior of $[0, 1]$. Furthermore, since $\psi$ has a compact support, denote $[a, b] \supset \mathrm{supp}(\psi)$ and $c_0^{-1} = \lceil b - a \rceil$. Now, the

$$\psi_{j,\, m c_0^{-1}}, \qquad m = 1, \dots, c_0(2^j - 1),$$

have disjoint supports. Note that $c_0$ depends only on $\psi$.

Let $(j_n)_{n \ge 0}$ be a positive sequence tending to $\infty$. For all $1 \le m \le c_0(2^{j_n} - 1)$, let

$$f_m = 2^{-j_n(r+1/2)}\, \psi_{j_n,\, m c_0^{-1}}.$$

($f_m$ depends on $n$, but since it is apparent that the set $\mathcal{M}_n$ depends on $n$, we do not repeat it for $f_m$ so as to simplify notation.) Define

$$\mathcal{M}_n(j_n) = \{ f_1, \dots, f_{c_0(2^{j_n} - 1)} \}.$$

$\mathcal{M}_n(j_n)$ is a subset of $\Sigma(r)$ and, under suitable choices of $j_n$ and $\rho_n$ specified below, it is in fact a subset of $\Sigma(r, \rho_n)$.

The existence of confidence bands for $\mathcal{P}_n$ is related to the possibility of testing accurately the alternative:

$$H_0 : f = 0 \quad \text{against} \quad H_1 : f \in \mathcal{M}_n(j_n). \qquad (4)$$

A test $T_n$ based on a sample of size $n$ is any

$$T_n : \Omega \to \{0, 1\}$$

which is measurable with respect to $\sigma(Y_1, \dots, Y_n)$, the $\sigma$-algebra generated by $Y_1, \dots, Y_n$. We denote by $\mathcal{T}_n$ the set of all tests $T_n$.

To assess the quality of the test $T_n$ designed to solve the testing problem 4, we use the sum of the errors of first and second type:

$$r(T_n, j_n) = E_{f_0}(T_n) + \sup_{f \in \mathcal{M}_n(j_n)} E_f(1 - T_n),$$

where $f_0 = 0$ identically.



We need the two following results. The first one shows that for $2^{-j_n r} \ll (\log n / n)^{r/(2r+1)}$, we cannot test efficiently $f_0$ against $\mathcal{M}_n(j_n)$. Now, $2^{-j_n r}$ is intuitively the distance in the sup-norm between $\mathcal{M}_n(j_n)$ and $\Sigma(s)$. As a result, Proposition 1 means that close to $\Sigma(s)$ we find sets which cannot be statistically distinguished from it. The proof is in section 3.1.

Proposition 1. Let $(j_n)_{n \ge 0}$ be any sequence satisfying $2^{-j_n r} = o(2^{-j_n^* r})$ as $n \to \infty$. Then

$$\liminf_{n \to \infty} \inf_{T_n \in \mathcal{T}_n} r(T_n, j_n) \ge 1.$$
Once we are given a confidence band, we can construct an obvious test to decide the testing problem 4. Let

$$T_n^0 = \mathbf{1}\{ C_n \cap \mathcal{M}_n(j_n) \ne \emptyset \}.$$

We accept $f_0$ if no $f_m$ is in $C_n$ and reject otherwise. If $j_n$ satisfies certain conditions, $T_n^0$ is a relevant test for the testing problem 4. The problem is that it may be possible to find $j_n$ such that both these conditions are satisfied and $2^{-j_n r} \ll (\log n / n)^{r/(2r+1)}$, leading to a contradiction. Lemma 1 indeed shows that if $\rho_n$ goes to $0$ too quickly, we may find $j_n$ such that both $\limsup_n r(T_n^0, j_n) < 1$ and the conditions of Proposition 1 are satisfied. To remedy this, the exclusion zone around $\Sigma(s)$ must contain all functions which prevent adaptation, and therefore $\rho_n$ must be big enough.

Lemma 1. Assume $\lim_{n \to \infty} \rho_n / r_n(r) = 0$. Then there exists $j_n$ such that

• $\mathcal{M}_n(j_n) \subset \Sigma(r, \rho_n)$, for all $n$ large enough, depending only on $B$, $r$ and $s$.

• $r_n(s) / 2^{-j_n r} \to 0$ as $n \to \infty$.

• $2^{-j_n r} = o(2^{-j_n^* r})$ as $n \to \infty$.

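Proof. Let $j_n = \min\left( \left\lfloor -\frac{\log_2(2\rho_n)}{r} \right\rfloor - 1, \ \left\lfloor -\frac{\log_2(r_n(s) \log n)}{r} \right\rfloor \right)$. Then

$$\max(2\rho_n, \ r_n(s) \log n) \le 2^{-j_n r}.$$

First, for all $1 \le m \le c_0(2^{j_n} - 1)$,

$$d(f_m, \Sigma(s)) \ge 2^{-j_n r}\left(1 - B\, 2^{j_n(r - s)}\right)$$

(as in Hoffmann and Nickl, in section 2.1, inequality (2.5)). Since $\rho_n$ is less than the right-hand side for all $n$ large enough depending only on $B$, $r$ and $s$, $\mathcal{M}_n(j_n)$ is included in $\Sigma(r, \rho_n)$ for all $n$ large enough. Then,

$$\frac{r_n(s)}{2^{-j_n r}} \le \frac{r_n(s)}{r_n(s) \log n} \to 0$$

as $n \to \infty$.

Finally, write $j_n = \min(\lfloor a_n \rfloor, \lfloor b_n \rfloor)$; $a_n, b_n \to \infty$ as $n \to \infty$. So $\lfloor a_n \rfloor \sim a_n$, but $2^{-a_n r} = o(r_n(r))$ and $2^{-\lfloor a_n \rfloor r} \le 2^{-a_n r}\, 2^r$, so $2^{-\lfloor a_n \rfloor r} = o(2^{-j_n^* r})$. Likewise, $2^{-\lfloor b_n \rfloor r} = o(2^{-j_n^* r})$. So $2^{-j_n r} = o(2^{-j_n^* r})$.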

□ 

We now prove Proposition 2, which is equivalent to the first part of Theorem 2. We proceed by reductio ad absurdum. The main step is to prove that $T_n^0$ is consistent for the testing problem 4.

Proposition 2. Assume $\liminf_{n \to \infty} \rho_n / r_n(r) = 0$. Then we cannot find an honest and adaptive confidence band of level $\alpha < 1/2$ for $\mathcal{P}_n$.

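Proof. Assume otherwise. Take a subsequence of $\rho_n / r_n(r)$ which tends towards $0$, and still denote its subscript by $n$. Take $j_n$ as in Lemma 1. We first show that $\limsup_n r(T_n^0, j_n) \le 2\alpha$.

$$\begin{aligned}
E_{f_0}(T_n^0) &= P_{f_0}\left(C_n \cap \mathcal{M}_n(j_n) \ne \emptyset\right) \\
&= P_{f_0}\left(C_n \cap \mathcal{M}_n(j_n) \ne \emptyset, \ f_0 \in C_n\right) + P_{f_0}\left(C_n \cap \mathcal{M}_n(j_n) \ne \emptyset, \ f_0 \notin C_n\right) \\
&\le P_{f_0}\left(|C_n| \ge d(\Sigma(s), \mathcal{M}_n(j_n))\right) + P_{f_0}(f_0 \notin C_n) \\
&\le \frac{E_{f_0}|C_n|}{d(\Sigma(s), \mathcal{M}_n(j_n))} + \alpha + o(1)
\end{aligned}$$

thanks to Markov's inequality and honesty of $C_n$. So

$$E_{f_0}(T_n^0) \le \frac{L\, r_n(s)}{2^{-j_n r}\left(1 - B\, 2^{j_n(r-s)}\right)} + \alpha + o(1) = \alpha + o(1)$$

thanks to Lemma 1 (second point) and adaptivity of $C_n$.

For $f_m \in \mathcal{M}_n(j_n)$,

$$\begin{aligned}
E_{f_m}(1 - T_n^0) &= P_{f_m}\left(C_n \cap \mathcal{M}_n(j_n) = \emptyset\right) \\
&\le P_{f_m}(f_m \notin C_n) \\
&\le \sup_{f \in \mathcal{P}_n} P_f(f \notin C_n)
\end{aligned}$$

thanks to Lemma 1 (first point). So

$$\sup_{f_m \in \mathcal{M}_n(j_n)} E_{f_m}(1 - T_n^0) \le \sup_{f \in \mathcal{P}_n} P_f(f \notin C_n) \le \alpha + o(1)$$

by honesty of $C_n$. Therefore we have

$$\limsup_n r(T_n^0, j_n) \le 2\alpha < 1$$

but

$$\liminf_{n \to \infty} \inf_{T_n \in \mathcal{T}_n} r(T_n, j_n) \ge 1,$$

thanks to Lemma 1 (third point) and Proposition 1, which is a contradiction.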

□ 

2.3 Proof of upper bound 
2.3.1 Construction of C n 

Let $\rho_n = \lambda r_n(r)$, with $\lambda$ chosen below. We will estimate the function $f$ to construct confidence bands. We need two different estimators. We use the local polynomial estimator of order $l$ with $h = (\log n / n)^{1/(2r+1)}$, $f_n(h)$, which we denote by $f_n$ in this section, defined in Theorem 3. Thanks to Equation 10, we know that for some $b > 0$, for all $f \in \Sigma(r)$,

$$\|E_f f_n - f\|_\infty \le b\, r_n(r). \qquad (5)$$

$\hat f_n$ is the estimator given by Theorem 1. $\hat f_n$ is adaptive over $\Sigma(s) \cup \Sigma(r)$ with rate $r_n(\cdot)$. In other words, for $D = D(B, l)$,

$$\sup_{f \in \Sigma(t)} E_f \|\hat f_n - f\|_\infty \le D\, r_n(t), \qquad t = r, s. \qquad (6)$$

We are now able to construct the confidence band $C_n$. The idea is to adapt the width of $C_n$ to the likelihood that $f$ is in $\Sigma(r)$ or in $\Sigma(s)$. Define

$$d_n = d(f_n, \Sigma(s)).$$

If $d_n$ is greater than some constant times $r_n(r)$, Equation 5 shows that it is unlikely that $f$ is in $\Sigma(s)$ (intuitively, $f_n \approx E_f f_n$), and thus we choose $\hat f_n \pm L r_n(r)$. Otherwise, we choose $\hat f_n \pm L r_n(s)$. Accordingly, define $C_n$:

$$\hat f_n \pm L r_n(r) \ \text{ if } d_n > \tau \quad \text{and} \quad \hat f_n \pm L r_n(s) \ \text{ if } d_n \le \tau, \qquad (7)$$

where $\tau = \kappa r_n(r)$ and where $\kappa$ and $L$ are constants chosen below. Let us now prove that $C_n$ is honest and adaptive.
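The selection rule (7) can be sketched in a few lines. This is only an illustration under stated assumptions: the distance $d_n$ to $\Sigma(s)$ is taken as a given input (the text notes that these sets cannot be easily computed), the centre of the band is passed as a list of values on a grid, and the names `band_half_width`, `confidence_band` and the numerical values of `L` and `kappa` are hypothetical.

```python
import math


def rate(n: int, t: float) -> float:
    """r_n(t) = (log n / n) ** (t / (2t + 1))."""
    return (math.log(n) / n) ** (t / (2.0 * t + 1.0))


def band_half_width(d_n: float, n: int, r: float, s: float,
                    L: float, kappa: float) -> float:
    """Half-width chosen by rule (7): wide if d_n > tau, narrow otherwise."""
    tau = kappa * rate(n, r)  # threshold tau = kappa * r_n(r)
    return L * rate(n, r) if d_n > tau else L * rate(n, s)


def confidence_band(center, d_n, n, r, s, L, kappa):
    """Return the list of intervals [c - w, c + w] around the centre values."""
    w = band_half_width(d_n, n, r, s, L, kappa)
    return [(c - w, c + w) for c in center]


if __name__ == "__main__":
    # Hypothetical inputs: a flat centre estimate and two values of d_n.
    center = [0.0, 0.1, 0.2]
    for d_n in (0.0, 1.0):
        print(d_n, confidence_band(center, d_n, n=1000, r=1.0, s=2.0,
                                   L=2.0, kappa=3.0))
```

The only data-dependent decision is the comparison of $d_n$ with $\tau$; everything else is deterministic given $n$, $r$, $s$, $L$ and $\kappa$.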
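2.3.2 Proof of honesty and adaptivity

We first prove that $C_n$ is an honest confidence band for $\mathcal{P}_n$. Let $0 < \alpha < 1$. If $f \in \Sigma(s)$ we have, using adaptivity of $\hat f_n$ and Markov's inequality,

$$\begin{aligned}
\inf_{f \in \Sigma(s)} P_f(f \in C_n) &\ge 1 - \sup_{f \in \Sigma(s)} P_f\left(\|\hat f_n - f\|_\infty > L r_n(s)\right) \\
&\ge 1 - \frac{1}{L r_n(s)} \sup_{f \in \Sigma(s)} E_f \|\hat f_n - f\|_\infty \\
&\ge 1 - \frac{D}{L}
\end{aligned}$$

which is greater than $1 - \alpha$ for $L$ large enough.

Now, if $f \in \Sigma(r, \rho_n)$, we have, using again Markov's inequality,

$$\inf_{f \in \Sigma(r, \rho_n)} P_f(f \in C_n) \ge 1 - \sup_{f \in \Sigma(r, \rho_n)} \frac{E_f \|\hat f_n - f\|_\infty}{L r_n(r)} - \sup_{f \in \Sigma(r, \rho_n)} P_f(d_n \le \tau)$$

and the first subtracted term is smaller than $\alpha/2$ for $L$ large enough. Now, for every $f \in \Sigma(r, \rho_n)$, $P_f(d_n \le \tau)$ equals

$$\begin{aligned}
P_f\left( \inf_{g \in \Sigma(s)} \|f_n - g\|_\infty \le \kappa r_n(r) \right) &\le P_f\left( \inf_{g \in \Sigma(s)} \|f - g\|_\infty - \|f_n - E_f f_n\|_\infty - \|E_f f_n - f\|_\infty \le \kappa r_n(r) \right) \\
&\le P_f\left( \rho_n - \|E_f f_n - f\|_\infty - \kappa r_n(r) \le \|f_n - E_f f_n\|_\infty \right) \\
&\le P_f\left( \|f_n - E_f f_n\|_\infty \ge (\lambda - \kappa - b)\, r_n(r) \right) \\
&\le c_1 n^{-\gamma_1} = o(1)
\end{aligned}$$

thanks to Corollary 1, by choosing $\lambda$ large enough. This completes the proof of honesty of the band. Let us now deal with adaptivity.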
By definition of $C_n$ we have

$$|C_n| \le L r_n(r)$$

so the case $f \in \Sigma(r, \rho_n)$ is immediate. If $f \in \Sigma(s)$ then,

$$\begin{aligned}
E_f |C_n| &\le L r_n(r)\, P_f(d_n > \tau) + L r_n(s) \\
&\le L r_n(r)\, P_f\left( \inf_{g \in \Sigma(s)} \|f_n - g\|_\infty > \kappa r_n(r) \right) + L r_n(s) \\
&\le L r_n(r)\, P_f\left( \|f_n - f\|_\infty > \kappa r_n(r) \right) + L r_n(s) \\
&\le L r_n(r)\, P_f\left( \|f_n - E_f f_n\|_\infty > (\kappa - b)\, r_n(r) \right) + L r_n(s) \\
&\le L r_n(r)\, c_2 n^{-\gamma_2} + L r_n(s) = O(r_n(s))
\end{aligned}$$

thanks to Corollary 1, since $\gamma_2$ is sufficiently large if $\kappa$ is chosen large enough.
3 Auxiliary results

The first section proves Proposition 1. It is the adaptation of the proof in Lepski and Tsybakov [12]. Lemma 2 is a standard procedure to lower bound the error; Lemma 3 and Lemma 4 deal with the functions in $\mathcal{M}_n(j_n)$, and Lemma 5 uses them all to give the required bound, thanks to Bahr-Esseen's inequality [16]. The second section presents the estimators used to construct the confidence bands and to prove the existence of adaptive estimators. The third section deals with the concentration inequality used in proving that the bands are indeed honest and adaptive, and that the estimator we construct is indeed adaptive.

3.1 Proof of Proposition 1 

Throughout this section, $T_n$ is any test, $(j_n)_{n \ge 0}$ is a non-negative real sequence such that $2^{-j_n r} = o(2^{-j_n^* r})$ as $n \to \infty$, and $M = c_0(2^{j_n} - 1)$. Recall the definition of $\mathcal{M}_n(j_n)$ and that $f_0 = 0$.

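Lemma 2. For all $0 < \eta < 1$,

$$r(T_n, j_n) \ge (1 - \eta)\, P_{f_0}\left( \frac{1}{M} \sum_{m=1}^M \xi_m \ge 1 - \eta \right),$$

where $\xi_m = \frac{dP_m}{dP_0}(\varepsilon_1, \dots, \varepsilon_n)$, $P_m = P_{f_m}$ and $P_0 = P_{f_0}$.

Proof.

$$\begin{aligned}
r(T_n, j_n) &= E_{f_0}(T_n) + \sup_{f \in \mathcal{M}_n(j_n)} E_f(1 - T_n) \\
&\ge E_{f_0}(T_n) + \frac{1}{M} \sum_{m=1}^M E_{f_m}(1 - T_n) \\
&\ge E_{f_0}(T_n) + E_{f_0}\left((1 - T_n) Z_n\right)
\end{aligned}$$

where

$$Z_n = \frac{1}{M} \sum_{m=1}^M \frac{dP_m}{dP_0}(Y_1, \dots, Y_n).$$

Let $0 < \eta < 1$. If $T_n = 1$ then $T_n + (1 - T_n) Z_n = 1 \ge 1 - \eta$, else $T_n + (1 - T_n) Z_n = Z_n$. So,

$$\begin{aligned}
T_n + (1 - T_n) Z_n &\ge \left(T_n + (1 - T_n) Z_n\right) \mathbf{1}_{Z_n \ge 1 - \eta} \\
&\ge (1 - \eta)\, \mathbf{1}_{Z_n \ge 1 - \eta}.
\end{aligned}$$

As a result,

$$r(T_n, j_n) \ge (1 - \eta)\, P_{f_0}(Z_n \ge 1 - \eta) = (1 - \eta)\, P_{f_0}\left( \frac{1}{M} \sum_{m=1}^M \frac{dP_m}{dP_0} \ge 1 - \eta \right).$$

□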



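We want to show that $P\left( \frac{1}{M} \sum_{m=1}^M \xi_m \ge 1 - \eta \right) \to 1$ as $n \to \infty$. This is sufficient to conclude because this quantity is independent of $T_n$. To do this, note that

$$P\left( \frac{1}{M} \sum_{m=1}^M \xi_m \ge 1 - \eta \right) = 1 - P\left( \frac{1}{M} \sum_{m=1}^M \xi_m < 1 - \eta \right).$$

Now,

$$P\left( \frac{1}{M} \sum_{m=1}^M \xi_m < 1 - \eta \right) = P\left( \frac{1}{M} \sum_{m=1}^M \xi_m - 1 < -\eta \right) \le P\left( \left| \frac{1}{M} \sum_{m=1}^M \xi_m - 1 \right| > \eta \right)$$

and we therefore want to prove that the last quantity tends towards $0$.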
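Lemma 3. For $1 \le m \le M$,

$$\xi_m = \exp\left( \frac{a_m}{\sigma} \zeta_m - \frac{a_m^2}{2\sigma^2} \right)$$

with $\zeta_m \sim \mathcal{N}(0, 1)$ i.i.d. and

$$a_m^2 = \sum_{i=1}^n f_m(x_i)^2.$$

(As for the $f_m$'s, we do not make explicit the dependence on $n$ of $a_m$.)

Proof.

$$\begin{aligned}
\xi_m &= \prod_{i=1}^n \frac{\exp\left( -\frac{(\varepsilon_i - f_m(x_i))^2}{2\sigma^2} \right)}{\exp\left( -\frac{\varepsilon_i^2}{2\sigma^2} \right)} \\
&= \exp\left( \frac{1}{2\sigma^2} \sum_{i=1}^n f_m(x_i)\left(2\varepsilon_i - f_m(x_i)\right) \right) \\
&= \exp\left( \frac{1}{\sigma^2} \sum_{i=1}^n f_m(x_i)\, \varepsilon_i - \frac{1}{2\sigma^2} \sum_{i=1}^n f_m(x_i)^2 \right).
\end{aligned}$$

Now, define

$$\zeta_m = \frac{1}{\sigma a_m} \sum_{i=1}^n f_m(x_i)\, \varepsilon_i.$$

Therefore,

$$\xi_m = \exp\left( \frac{a_m}{\sigma} \zeta_m - \frac{a_m^2}{2\sigma^2} \right).$$

$\zeta_m$ is a Gaussian random variable as a linear combination of independent Gaussian random variables, and straightforward computations show that $\zeta_m \sim \mathcal{N}(0, 1)$.

To prove that the $\zeta_m$'s are independent, we first note that the vector $(\zeta_1, \dots, \zeta_M)$ is a Gaussian vector. Indeed, any linear combination of its coordinates is a linear combination of the $\varepsilon_i$'s and consequently a Gaussian random variable as above. Thus, it is sufficient to prove that the covariances all equal $0$. But, for $m \ne m'$,

$$\mathrm{Cov}(\zeta_m, \zeta_{m'}) = (a_m a_{m'})^{-1} \sum_{i=1}^n f_m(x_i) f_{m'}(x_i) = 0$$

since the supports of the $f_m$'s do not overlap.

□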
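The identity of Lemma 3 can be checked numerically. The sketch below compares the likelihood ratio computed directly from the Gaussian densities with the closed form $\exp(a_m \zeta_m / \sigma - a_m^2 / (2\sigma^2))$. The alternative values `f_vals`, the noise level and the random seed are arbitrary assumptions chosen for illustration; they are not the wavelet functions $f_m$ of the text.

```python
import math
import random


def likelihood_ratio_direct(eps, f_vals, sigma):
    """dP_m/dP_0 evaluated at Y = eps, from the ratio of Gaussian densities."""
    log_ratio = sum((e * e - (e - f) ** 2) / (2.0 * sigma ** 2)
                    for e, f in zip(eps, f_vals))
    return math.exp(log_ratio)


def likelihood_ratio_lemma3(eps, f_vals, sigma):
    """Closed form exp(a * zeta / sigma - a^2 / (2 sigma^2)) of Lemma 3."""
    a = math.sqrt(sum(f * f for f in f_vals))            # a_m
    zeta = sum(f * e for f, e in zip(f_vals, eps)) / (sigma * a)  # zeta_m
    return math.exp(a * zeta / sigma - a * a / (2.0 * sigma ** 2))


if __name__ == "__main__":
    random.seed(0)
    sigma = 1.5
    f_vals = [0.1 * math.sin(i + 1) for i in range(20)]  # hypothetical alternative
    eps = [random.gauss(0.0, sigma) for _ in f_vals]     # noise under f_0 = 0
    print(likelihood_ratio_direct(eps, f_vals, sigma))
    print(likelihood_ratio_lemma3(eps, f_vals, sigma))
```

Both functions should agree up to floating-point error for any noise vector, since the second is an algebraic rewriting of the first.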
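Lemma 4. For all $1 \le m \le M$,

$$a_m^2 = n\, 2^{-j_n(2r+1)} + O\left(2^{j_n(1 - 2r)}\right)$$

with the last term independent of $m$.

Proof. We drop the $n$ subscript in $j_n$ so as to simplify notation. Recall $\mathrm{supp}(\psi_{jm}) \subset [0, 1]$. First,

$$\begin{aligned}
\left| \frac{1}{n} \sum_{i=1}^n \psi_{jm}^2(x_i) - \int_0^1 \psi_{jm}^2(t)\, dt \right| &= \left| \frac{1}{n} \sum_{i=1}^n 2^j \psi^2(2^j x_i - m) - 2^j \int_{[0,1]} \psi^2(2^j t - m)\, dt \right| \\
&= 2^j \left| \sum_{i=1}^n \int_{(i-1)/n}^{i/n} \left( \psi^2(2^j x_i - m) - \psi^2(2^j t - m) \right) dt \right| \\
&\le 2^j \sum_{i=1}^n \int_{(i-1)/n}^{i/n} \left| \psi(2^j x_i - m) - \psi(2^j t - m) \right| \left| \psi(2^j x_i - m) + \psi(2^j t - m) \right| dt \\
&\le \frac{2^{2j}\, \|\psi'\|_\infty\, 2 \|\psi\|_\infty}{n}.
\end{aligned}$$

Here we use the mean value theorem and the fact that for all $i \in [1, n]$ and $t \in \left[\frac{i-1}{n}, \frac{i}{n}\right]$,

$$|t - x_i| \le \frac{1}{n}.$$

Therefore,

$$\sum_{i=1}^n 2^j \psi^2(2^j x_i - m) = n \int_0^1 \psi_{jm}^2(t)\, dt + O(2^{2j})$$

and recall $\int_0^1 \psi_{jm}^2(t)\, dt = 1$. Now,

$$\begin{aligned}
a_m^2 = \sum_{i=1}^n f_m^2(x_i) &= 2^{-j(2r+1)} \sum_{i=1}^n \psi_{jm}^2(x_i) \\
&= 2^{-j(2r+1)} \sum_{i=1}^n 2^j \psi^2(2^j x_i - m) \\
&= 2^{-j(2r+1)} \left( n + O(2^{2j}) \right) \\
&= n\, 2^{-j(2r+1)} + O\left(2^{j(1 - 2r)}\right).
\end{aligned}$$

□

Lemma 5. Let $0 < v \le 1$ and $0 < \eta < 1$. Then

$$P\left( \left| \frac{1}{M} \sum_{m=1}^M \xi_m - 1 \right| > \eta \right) \le c_1 \frac{\exp\left(c_2\, n\, 2^{-j_n(2r+1)}\right)}{(2^{j_n} - 1)^v}$$

where $c_1$ and $c_2$ are positive constants.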

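Proof. First note that for all $1 \le m \le M$, $E(\xi_m) = 1$, so we have, by Markov's inequality,

$$\begin{aligned}
P\left( \left| \frac{1}{M} \sum_{m=1}^M \xi_m - 1 \right| > \eta \right) &= P\left( \left| \frac{1}{M} \sum_{m=1}^M (\xi_m - E(\xi_m)) \right| > \eta \right) \\
&= P\left( \left| \sum_{m=1}^M (\xi_m - E(\xi_m)) \right| > M\eta \right) \\
&= P\left( \left| \sum_{m=1}^M (\xi_m - E(\xi_m)) \right|^{1+v} > (M\eta)^{1+v} \right) \\
&\le \frac{E\left| \sum_{m=1}^M (\xi_m - E(\xi_m)) \right|^{1+v}}{(M\eta)^{1+v}}.
\end{aligned}$$

Now, Bahr and Esseen recall that, for $1 \le r \le 2$ and $x, y$ complex numbers, $|x + y|^r + |x - y|^r \le 2(|x|^r + |y|^r)$. This, and the inequality which bears their names, give

$$\begin{aligned}
E\left| \sum_{m=1}^M (\xi_m - E(\xi_m)) \right|^{1+v} &\le 2 \sum_{m=1}^M E\left| \xi_m - E(\xi_m) \right|^{1+v} \\
&\le K \sum_{m=1}^M E\left( |\xi_m|^{1+v} + 1 \right) \\
&= K \left( \sum_{m=1}^M E\left(|\xi_m|^{1+v}\right) + M \right)
\end{aligned}$$

which leads us to

$$P\left( \left| \frac{1}{M} \sum_{m=1}^M \xi_m - 1 \right| > \eta \right) \le K \left( \frac{\sum_{m=1}^M E\left(|\xi_m|^{1+v}\right)}{(M\eta)^{1+v}} + \frac{1}{M^v \eta^{1+v}} \right). \qquad (8)$$

But, thanks to Lemma 4, we have

$$\begin{aligned}
E\left(|\xi_m|^{1+v}\right) &= \exp\left( \frac{a_m^2}{2\sigma^2} (v + v^2) \right) \\
&= \exp\left( \frac{1}{2\sigma^2} (v + v^2) \left( n\, 2^{-j_n(2r+1)} + O\left(2^{j_n(1-2r)}\right) \right) \right) \\
&= \exp\left( \frac{1}{2\sigma^2} (v + v^2)\, n\, 2^{-j_n(2r+1)} \right) \exp\left( O\left(2^{j_n(1-2r)}\right) \right),
\end{aligned}$$

with $\exp\left(O(2^{j_n(1-2r)})\right)$ bounded independently of $m$, thanks to the same lemma, by a constant $K'$, because $r > 1/2$.

Now, Equation 8 leads to

$$\begin{aligned}
P\left( \left| \frac{1}{M} \sum_{m=1}^M \xi_m - 1 \right| > \eta \right) &\le K \left( \frac{M K' \exp\left( \frac{1}{2\sigma^2} (v + v^2)\, n\, 2^{-j_n(2r+1)} \right)}{(M\eta)^{1+v}} + \frac{1}{M^v \eta^{1+v}} \right) \\
&\le K \left( \frac{K' \exp\left( \frac{1}{2\sigma^2} (v + v^2)\, n\, 2^{-j_n(2r+1)} \right)}{M^v \eta^{1+v}} + \frac{1}{M^v \eta^{1+v}} \right) \\
&\le 2 K \max(K', 1)\, \frac{\exp\left( \frac{1}{2\sigma^2} (v + v^2)\, n\, 2^{-j_n(2r+1)} \right)}{c_0^v\, (2^{j_n} - 1)^v\, \eta^{1+v}}
\end{aligned}$$

since $1 \le \exp\left( \frac{1}{2\sigma^2} (v + v^2)\, n\, 2^{-j_n(2r+1)} \right)$. □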
As a consequence, we just have to plug in the "value" of $j_n$ in

$$P\left( \left| \frac{1}{M} \sum_{m=1}^M \xi_m - 1 \right| > \eta \right) \le c_1 \frac{\exp\left(c_2\, n\, 2^{-j_n(2r+1)}\right)}{(2^{j_n} - 1)^v}.$$

Now, $2^{-j_n r} = o(2^{-j_n^* r})$ as $n \to \infty$. But $2^{-j_n^* r} \simeq (\log n / n)^{r/(2r+1)}$, so $2^{-j_n(2r+1)} = o(\log n / n)$ as $n \to \infty$ and therefore the argument of the exponential is negligible compared with $\log n$. Since $2^{j_n} \ge (n / \log n)^{1/(2r+1)}$, the ratio tends towards $0$ as $n \to \infty$. The discussion before Lemma 3 shows this completes the proof of Proposition 1.

3.2 Estimators 

3.2.1 Local polynomial estimator 

We fix an integer $l > s$. $t$ is any number such that $0 < t \le l$. In this section, $E$ stands for $E_f$. Reading through the proofs of Proposition 1.13 and Theorem 1.8 in Tsybakov [15], we get the following.

Theorem 3. There exist $n_0 \in \mathbb{N}$, $c_1 > 0$ such that, for all $n \ge n_0$, for all $(\log n / n)^{1/(2l+1)} \ge h \ge \log n / n$, there exist functions $x \in [0, 1] \mapsto W_{ni}(h, x)$, $1 \le i \le n$, verifying:

• $\sup_{i, x} |W_{ni}(x)| \le \frac{c_1}{nh}$

• $\sum_{i=1}^n |W_{ni}(x)| \le c_1$

• $W_{ni}(x) = 0$ if $|x_i - x| > h$

• $x \mapsto W_{ni}(x)$ is continuous.

There exists further $c_2 > 0$ such that, for all $0 < t \le l$, for all $f \in \Sigma(t)$, the local polynomial estimator of order $l$,

$$f_n(h)(x) = \sum_{i=1}^n W_{ni}(h, x)\, Y_i = \sum_{i=1}^n W_{ni}(x)\, Y_i \qquad (9)$$

satisfies:

$$\|E f_n(h) - f\|_\infty \le b\, h^t =: B(h, f) \qquad (10)$$

$$E\|f_n(h) - E f_n(h)\|_\infty \le c_2 \left(\frac{\log n}{nh}\right)^{1/2} =: c_2\, \sigma(h, n). \qquad (11)$$

For $h = h_n(t) = \left(\frac{\log n}{n}\right)^{1/(2t+1)}$, this implies that $f_n(h)$ satisfies

$$\sup_{f \in \Sigma(t)} E_f \|f_n(h_n(t)) - f\|_\infty \le D \left(\frac{\log n}{n}\right)^{t/(2t+1)}$$

with $D$ depending on $B$ and $l$. This means that $f_n(h_n(t))$ is rate optimal over $\Sigma(t)$. Thanks to it, we shall now construct adaptive estimators, rate optimal over a range of regularity indexes, by proving Theorem 1.
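To make the weights $W_{ni}(h, x)$ concrete, here is a minimal sketch of the simplest non-trivial case: a local linear fit (order 1) with a uniform kernel on the design $x_i = i/n$. The kernel choice, the order and the function names are assumptions made for illustration; the estimator of Theorem 3 is of general order $l$ and follows Tsybakov [15].

```python
def local_linear_weights(x, n, h):
    """Weights of a local linear fit at x with a uniform kernel on x_i = i/n.

    The weights vanish when |x_i - x| > h, sum to one, and satisfy
    sum_i W_i * (x_i - x) = 0, so linear functions are reproduced exactly.
    """
    z = [(i / n) - x for i in range(1, n + 1)]
    k = [1.0 if abs(zi) <= h else 0.0 for zi in z]   # uniform kernel window
    s0 = sum(k)
    s1 = sum(ki * zi for ki, zi in zip(k, z))
    s2 = sum(ki * zi * zi for ki, zi in zip(k, z))
    det = s0 * s2 - s1 * s1
    if det <= 1e-15:
        raise ValueError("bandwidth too small: need two design points in the window")
    return [ki * (s2 - zi * s1) / det for ki, zi in zip(k, z)]


def local_linear_estimate(x, ys, h):
    """Evaluate sum_i W_ni(h, x) * Y_i, as in Equation 9, for the weights above."""
    w = local_linear_weights(x, len(ys), h)
    return sum(wi * yi for wi, yi in zip(w, ys))


if __name__ == "__main__":
    n, h = 50, 0.1
    ys = [2.0 * (i / n) + 1.0 for i in range(1, n + 1)]  # noise-free linear data
    for x in (0.0, 0.37, 1.0):
        print(x, local_linear_estimate(x, ys, h))
```

On noise-free linear data the estimate coincides with the true function, including at the boundary points, which is the polynomial reproduction property behind the bias bound (10).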
3.2.2 Proof of Theorem 1: existence of adaptive estimators

The proof is an adaptation of that of Giné and Nickl [8] in the density case, which uses Lepski's method. For each $0 < t \le l$, we have an estimator $f_n(h_n(t))$ rate optimal over $\Sigma(t)$. We want to devise a procedure which allows us to choose $\hat h_n$ according only to the data, such that if $f \in \Sigma(t)$, $\hat h_n$ will be roughly $h_n(t)$. Lepski's method consists in discretising the set of all possible bandwidths, and choosing $\hat h_n$ so that, for all $h \simeq h_n(t) \le \hat h_n$, the distance between $f_n(h)$ and $f_n(\hat h_n)$ is of the order of the optimal rate $r_n(t)$.



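Fix $\rho > 1$. Define

$$\mathcal{H} = \left\{ \rho^{-k} : k \ge 0, \ \rho^{-k} \ge \frac{(\log n)^2}{n} \right\}$$

and

$$\hat h_n = \max\left\{ h \in \mathcal{H} : \forall g \le h, \ g \in \mathcal{H}, \ \|f_n(h) - f_n(g)\|_\infty \le \sqrt{M} \left(\frac{\log n}{ng}\right)^{1/2} \right\} \qquad (12)$$

with $M = 16\left(\sqrt{2K}\, \sigma c_1 + c_2\right)^2$; $K$ is chosen later. Note that $|\mathcal{H}| \simeq \log n$.

Indeed, for all $t > 0$, $0 < h_n(t) < 1$ and we choose $(\log n)^2$ for practical reasons (this choice is not restrictive since what matters in this ratio is the power of $n$, not that of the logarithm).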
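The bandwidth selection rule (12) can be sketched as follows. To keep the example short and self-contained, a local average (uniform kernel, order 0) stands in for the local polynomial estimator, and the sup-norm is approximated by the maximum over the design points; both are simplifying assumptions, as are the names `lepski_bandwidth`, `bandwidth_grid` and the default constant `M=1.0`, which is not the value of $M$ used in the proof.

```python
import math


def local_average(ys, h):
    """Uniform-kernel local average on x_i = i/n, evaluated at the design points."""
    n = len(ys)
    xs = [(i + 1) / n for i in range(n)]
    out = []
    for x in xs:
        window = [y for xi, y in zip(xs, ys) if abs(xi - x) <= h]
        out.append(sum(window) / len(window))  # window always contains x itself
    return out


def bandwidth_grid(n, rho=2.0):
    """Grid {rho^-k : k >= 0, rho^-k >= (log n)^2 / n}, in decreasing order."""
    h_min = math.log(n) ** 2 / n
    grid, h = [], 1.0
    while h >= h_min:
        grid.append(h)
        h /= rho
    return grid


def lepski_bandwidth(ys, M=1.0, rho=2.0):
    """Largest h in the grid whose fit is close to every fit with g <= h."""
    n = len(ys)
    grid = bandwidth_grid(n, rho)
    fits = {h: local_average(ys, h) for h in grid}

    def sup_dist(u, v):
        return max(abs(a - b) for a, b in zip(u, v))

    for h in grid:  # scan from the largest bandwidth downwards
        if all(sup_dist(fits[h], fits[g]) <= math.sqrt(M * math.log(n) / (n * g))
               for g in grid if g <= h):
            return h
    return grid[-1]


if __name__ == "__main__":
    print(lepski_bandwidth([3.0] * 100))  # constant data: every fit agrees
```

The smallest bandwidth of the grid always satisfies the condition, so the scan terminates; for constant data all fits coincide and the largest bandwidth is selected.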

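The adaptive estimator is $\hat f_n = f_n(\hat h_n)$ but we will write $f_n(\hat h_n)$ in the proof. Fix now $t > 0$ and $f \in \Sigma(t)$. Define

$$h_f = \max\left\{ h \in \mathcal{H} : B(h, f) \le \frac{\sqrt{M}}{4}\, \sigma(h, n) \right\}$$

which verifies, thanks to Theorem 3,

$$h_f \simeq \left(\frac{\log n}{n}\right)^{1/(2t+1)}.$$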

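We now bound $E\|f_n(\hat h_n) - f\|_\infty$. We distinguish the cases $\{\hat h_n \ge h_f\}$ and $\{\hat h_n < h_f\}$. First,

$$\begin{aligned}
E\|f_n(\hat h_n) - f\|_\infty \mathbf{1}_{\hat h_n \ge h_f} &\le E\left[ \left( \|f_n(\hat h_n) - f_n(h_f)\|_\infty + \|f_n(h_f) - E f_n(h_f)\|_\infty + \|E f_n(h_f) - f\|_\infty \right) \mathbf{1}_{\hat h_n \ge h_f} \right] \\
&\le E\left( \sum_{h \ge h_f} \|f_n(h) - f_n(h_f)\|_\infty \mathbf{1}_{\hat h_n = h} \right) + E\|f_n(h_f) - E f_n(h_f)\|_\infty + \|E f_n(h_f) - f\|_\infty \\
&\le \sqrt{M}\, \sigma(h_f, n)\, P(\hat h_n \ge h_f) + E\|f_n(h_f) - E f_n(h_f)\|_\infty + \|E f_n(h_f) - f\|_\infty \\
&\le \sqrt{M}\, \sigma(h_f, n) + c_2\, \sigma(h_f, n) + \frac{\sqrt{M}}{4}\, \sigma(h_f, n) = O(\sigma(h_f, n)).
\end{aligned}$$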



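Then,

$$\begin{aligned}
E\|f_n(\hat h_n) - f\|_\infty \mathbf{1}_{\hat h_n < h_f} &= \sum_{h \in \mathcal{H},\, h < h_f} E\left[ \|f_n(h) - f\|_\infty \mathbf{1}_{\hat h_n = h} \right] \\
&\le \sum_{h \in \mathcal{H},\, h < h_f} E\left[ \left( \|f_n(h) - E f_n(h)\|_\infty + \|E f_n(h) - f\|_\infty \right) \mathbf{1}_{\hat h_n = h} \right] \\
&\le \sum_{h \in \mathcal{H},\, h < h_f} \left( E\left( \|f_n(h) - E f_n(h)\|_\infty \mathbf{1}_{\hat h_n = h} \right) + \|E f_n(h) - f\|_\infty\, E\left(\mathbf{1}_{\hat h_n = h}\right) \right) \\
&\le \sum_{h \in \mathcal{H},\, h < h_f} c_2\, \sigma\left( \frac{(\log n)^2}{n}, n \right) P(\hat h_n = h)^{1/2} + B(h_f, f)\, P(\hat h_n < h_f) \\
&\le \frac{c_2}{\sqrt{\log n}} \sum_{h \in \mathcal{H},\, h < h_f} P(\hat h_n = h)^{1/2} + B(h_f, f).
\end{aligned}$$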
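Now, for $h < h_f$, and writing $h^+ = \rho h$,

$$\begin{aligned}
P(\hat h_n = h) &\le P\left( \text{there is } g \le h \text{ such that } h^+ \text{ and } g \text{ do not satisfy the inequality 12} \right) \\
&\le \sum_{g \le h,\, g \in \mathcal{H}} P\left( \|f_n(h^+) - f_n(g)\|_\infty > \sqrt{M} \left(\frac{\log n}{ng}\right)^{1/2} \right) \\
&\le \sum_{g \le h,\, g \in \mathcal{H}} P\left( \|f_n(h^+) - f_n(g)\|_\infty > \sqrt{M}\, \sigma(g, n) \right).
\end{aligned}$$

For $g \le h$, we have

$$\begin{aligned}
\|f_n(h^+) - f_n(g)\|_\infty &\le \|f_n(h^+) - E f_n(h^+)\|_\infty + \|E f_n(h^+) - f\|_\infty + \|f - E f_n(g)\|_\infty + \|E f_n(g) - f_n(g)\|_\infty \\
&\le 2 B(h_f, f) + \|f_n(h^+) - E f_n(h^+)\|_\infty + \|E f_n(g) - f_n(g)\|_\infty
\end{aligned}$$

since $B(h, f)$ increases with $h$,

$$\begin{aligned}
&\le 2\, \frac{\sqrt{M}}{4}\, \sigma(h_f, n) + \|f_n(h^+) - E f_n(h^+)\|_\infty + \|E f_n(g) - f_n(g)\|_\infty \\
&\le \frac{\sqrt{M}}{2}\, \sigma(g, n) + \|f_n(h^+) - E f_n(h^+)\|_\infty + \|E f_n(g) - f_n(g)\|_\infty
\end{aligned}$$

since $\sigma(h, n)$ decreases with $h$, so that

$$\begin{aligned}
P\left( \|f_n(h^+) - f_n(g)\|_\infty > \sqrt{M}\, \sigma(g, n) \right) &\le P\left( \|f_n(h^+) - E f_n(h^+)\|_\infty + \|E f_n(g) - f_n(g)\|_\infty > \sqrt{M}\, \sigma(g, n) - \frac{\sqrt{M}}{2}\, \sigma(g, n) \right) \\
&\le P\left( \|f_n(h^+) - E f_n(h^+)\|_\infty > \frac{\sqrt{M}}{4}\, \sigma(h^+, n) \right) + P\left( \|f_n(g) - E f_n(g)\|_\infty > \frac{\sqrt{M}}{4}\, \sigma(g, n) \right) \\
&\le 4 \exp(-K \log n)
\end{aligned}$$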
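thanks to Proposition 3. Therefore,

$$P(\hat h_n = h) \le \sum_{g \le h,\, g \in \mathcal{H}} P\left( \|f_n(h^+) - f_n(g)\|_\infty > \sqrt{M}\, \sigma(g, n) \right) \le 4 \exp(-K \log n) \log n,$$

and

$$\begin{aligned}
E\|f_n(\hat h_n) - f\|_\infty \mathbf{1}_{\hat h_n < h_f} &\le \frac{c_2}{\sqrt{\log n}} \sum_{h \in \mathcal{H},\, h < h_f} P(\hat h_n = h)^{1/2} + B(h_f, f) \\
&\le \frac{c_2}{\sqrt{\log n}} \left( 4 \exp(-K \log n) \log n \right)^{1/2} \log n + B(h_f, f) \\
&= O(\sigma(h_f, n))
\end{aligned}$$

for $K$ large enough, chosen independently of $f$ or $t$, which completes the proof of Theorem 1.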



3.3 A concentration inequality 

The following proposition is a key result which allows us to prove both the honesty and adaptivity 
of C n in Section 2.3 as well as the existence of the adaptive estimators. Indeed, it gives an upper 
bound for the concentration of f n (h) around its expectation. The proof relies on an inequality 
due to Borell [1], which may be found in Theorem 1.7 in [11] in the form we use it. 

Theorem 4. Let $G(x)$, $x \in T$, be a centered Gaussian process indexed by the countable set $T$, and such that $\sup_{x \in T} |G(x)| < \infty$ almost surely. Then $E \sup_{x \in T} |G(x)| < \infty$, and for every $r > 0$ we have

$$P\left( \left| \sup_{x \in T} |G(x)| - E \sup_{x \in T} |G(x)| \right| \ge r \right) \le 2 e^{-r^2 / (2\sigma_0^2)}$$

where $\sigma_0^2 = \sup_{x \in T} E(G^2(x)) < \infty$.

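Proposition 3. Let $(\log n / n)^{1/(2l+1)} \ge h \ge \log n / n$ and let $c_1$ and $c_2$ be the constants in Theorem 3. Let $f_n(h)$ be the local polynomial estimator of order $l$ for $f$. Let $G_n = f_n(h) - E(f_n(h))$. Then,

$$\sigma_0^2 = \sup_{x \in [0,1]} E\left(G_n^2(x)\right) \le \frac{\sigma^2 c_1^2}{nh}, \qquad (13)$$

and, for $u \ge c_2 \left(\frac{\log n}{nh}\right)^{1/2}$,

$$P\left( \|G_n\|_\infty > u \right) \le 2 \exp\left( -\frac{nh}{2\sigma^2 c_1^2} \left( u - c_2 \left(\frac{\log n}{nh}\right)^{1/2} \right)^2 \right). \qquad (14)$$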


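Proof. Obviously, Equation 1 and Equation 9 imply that $(G_n(x), 0 \le x \le 1)$, which satisfies for all $0 \le x \le 1$,

$$\begin{aligned}
G_n(x) &= f_n(x) - E(f_n(x)) \\
&= \sum_{i=1}^n W_{ni}(x)\, Y_i - E\left( \sum_{i=1}^n W_{ni}(x)\, Y_i \right) \\
&= \sum_{i=1}^n W_{ni}(x)\, \varepsilon_i,
\end{aligned}$$

is a Gaussian process.

Recall the definition of $f_n$ and that the $\varepsilon_i$'s are i.i.d. $\mathcal{N}(0, \sigma^2)$. Then,

$$\begin{aligned}
\sigma_0^2 &= \sup_{x \in [0,1]} E\left(G_n^2(x)\right) \\
&= \sup_{x \in [0,1]} E\left( \left( \sum_{i=1}^n W_{ni}(x)\, \varepsilon_i \right)^2 \right) \\
&= \sup_{x \in [0,1]} \sum_{i=1}^n W_{ni}^2(x)\, \sigma^2 \\
&\le \sigma^2 \sup_{i, x} |W_{ni}(x)| \sup_{x \in [0,1]} \sum_{i=1}^n |W_{ni}(x)| \\
&\le \frac{\sigma^2 c_1^2}{nh}.
\end{aligned}$$

Then, we want to bound the probability

$$\begin{aligned}
P\left( \|G_n\|_\infty > u \right) &= P\left( \|G_n\|_\infty - E\|G_n\|_\infty > u - E\|G_n\|_\infty \right) \\
&\le P\left( \left| \|G_n\|_\infty - E\|G_n\|_\infty \right| > u - E\|G_n\|_\infty \right) \\
&= P\left( \left| \sup_{x \in [0,1] \cap \mathbb{Q}} |G_n(x)| - E \sup_{x \in [0,1] \cap \mathbb{Q}} |G_n(x)| \right| > u - E\|G_n\|_\infty \right)
\end{aligned}$$

since $G_n$ is continuous. Now, if $u - E\|G_n\|_\infty > 0$, we can apply Theorem 4 to $(G_n(x), x \in [0, 1] \cap \mathbb{Q})$ and write

$$P\left( \|G_n\|_\infty > u \right) \le 2 \exp\left( -\frac{(u - E\|G_n\|_\infty)^2}{2\sigma_0^2} \right).$$

Finally, thanks to 11 and 13, for $u \ge c_2 \left(\frac{\log n}{nh}\right)^{1/2}$,

$$P\left( \|G_n\|_\infty > u \right) \le 2 \exp\left( -\frac{nh}{2\sigma^2 c_1^2} \left( u - c_2 \left(\frac{\log n}{nh}\right)^{1/2} \right)^2 \right).$$

□

In the proof of section 2.3 we only need the following result, obtained with $h = \left(\frac{\log n}{n}\right)^{1/(2r+1)}$.

Corollary 1. Let $C > 0$. Take $u = (c_2 + C)\, r_n(r)$, then

$$P\left( \|G_n\|_\infty > u \right) \le 2 n^{-C^2 / (2\sigma^2 c_1^2)}.$$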